Abstract

We study classic streaming and sparse recovery problems using deterministic linear sketches, including $\ell_1/\ell_1$ and $\ell_\infty/\ell_1$ sparse recovery problems (the latter also being known as $\ell_1$-heavy hitters), norm estimation, and approximate inner product. We focus on devising a fixed matrix $A \in \mathbb{R}^{m \times n}$ and a deterministic recovery/estimation procedure which work for all possible input vectors simultaneously. Our results improve upon existing work, the following being our main contributions:

• A proof that $\ell_\infty/\ell_1$ sparse recovery and inner product estimation are equivalent, and that incoherent matrices can be used to solve both problems. Our upper bound for the number of measurements is $m = O(\varepsilon^{-2} \min\{\log n, (\log n/\log(1/\varepsilon))^2\})$. We can also obtain fast sketching and recovery algorithms by making use of the Fast Johnson-Lindenstrauss transform. Both our running times and number of measurements improve upon previous work. We can also obtain better error guarantees than previous work in terms of a smaller tail of the input vector.

• A new lower bound for the number of linear measurements required to solve $\ell_1/\ell_1$ sparse recovery. We show $\Omega(k/\varepsilon^2 + k\log(n/k)/\varepsilon)$ measurements are required to recover an $x'$ with $\|x - x'\|_1 \le (1 + \varepsilon)\|x_{\mathrm{tail}(k)}\|_1$, where $x_{\mathrm{tail}(k)}$ is $x$ projected onto all but its largest $k$ coordinates in magnitude.

• A tight bound of $m = \Theta(\varepsilon^{-2} \log(\varepsilon^2 n))$ on the number of measurements required to solve deterministic norm estimation, i.e., to recover $\|x\|_2 \pm \varepsilon\|x\|_1$.

For all the problems we study, tight bounds are already known for the randomized complexity from previous work, except in the case of $\ell_1/\ell_1$ sparse recovery, where a nearly tight bound is known. Our work thus aims to study the deterministic complexities of these problems.

1 Introduction 

In this work we provide new results for the point query problem as well as several other related problems: approximate inner product, $\ell_1/\ell_1$ sparse recovery, and deterministic norm estimation. For many of these problems efficient randomized sketching and streaming algorithms exist, and thus we are interested in understanding the deterministic complexities of these problems.



1.1 Applications 

Here we give a motivating application of the point query problem; for a formal definition of the problem, see below. Consider $k$ servers $S^1, \ldots, S^k$, each holding a database $D^1, \ldots, D^k$, respectively. The servers want to compute statistics of the union $D$ of the $k$ databases. For instance, the servers may want to know the frequency of a record or attribute-pair in $D$. It may be too expensive for the servers to communicate their individual databases to a centralized server, or to compute the frequency exactly. Hence, the servers wish to communicate a short summary or "sketch" of their databases to a centralized server, which can then combine the sketches to answer frequency queries about $D$.

We model the databases as vectors $x^i \in \mathbb{R}^n$. To compute a sketch of $x^i$, we compute $Ax^i$ for a matrix $A$ with $m$ rows and $n$ columns. Importantly, $m \ll n$, and so $Ax^i$ is much easier to communicate than $x^i$. The servers compute $Ax^1, \ldots, Ax^k$, respectively, and transmit these to a centralized server. Since $A$ is a linear map, the centralized server can compute $Ax$ for $x = c_1 x^1 + \cdots + c_k x^k$ for any real numbers $c_1, \ldots, c_k$. Notice that the $c_i$ are allowed to be both positive and negative, which is crucial for estimating the frequency of records or attribute-pairs in the difference of two datasets, which allows for tracking which items have experienced a sudden growth or decline in frequency. This is useful in network anomaly detection [TTJ [IH |25l |33l [39] , and also for transactional data [17]. It is also useful for maintaining the set of frequent items over a changing database relation fTT] .
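As a small illustration of the linearity used above, the following Python sketch combines two server sketches with coefficients $(1, -1)$ and checks that the result equals the sketch of the difference vector. The matrix `A`, the vectors `x1`, `x2`, and the helper names `sketch` and `combine` are hypothetical and chosen only for this example; the toy matrix is not claimed to have any recovery guarantee.

```python
def sketch(A, x):
    """Return the linear sketch Ax, with A given as a list of m rows of length n."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def combine(sketches, coeffs):
    """Return sum_i c_i * (A x^i), computed from the sketches alone."""
    m = len(sketches[0])
    return [sum(c * s[j] for c, s in zip(coeffs, sketches)) for j in range(m)]


# Toy 2 x 4 measurement matrix (hypothetical data, only to illustrate linearity).
A = [[1, 0, 1, -1],
     [0, 1, -1, 1]]
x1 = [3, 0, 2, 1]   # database vector held by server S^1
x2 = [1, 4, 0, 1]   # database vector held by server S^2
coeffs = [1, -1]    # a negative coefficient yields the sketch of x^1 - x^2

# The central server only sees the two sketches, never x1 or x2 themselves.
from_sketches = combine([sketch(A, x1), sketch(A, x2)], coeffs)

# Reference value: sketch the difference vector directly.
direct = sketch(A, [coeffs[0] * a + coeffs[1] * b for a, b in zip(x1, x2)])
```

Here `from_sketches` and `direct` agree, which is all the central server relies on when it combines sketches.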

Associated with $A$ is an output algorithm Out which, given $Ax$, outputs a vector $x'$ for which $\|x' - x\|_\infty \le \varepsilon\|x_{\mathrm{tail}(k)}\|_1$ for some number $k$, where $x_{\mathrm{tail}(k)}$ denotes the vector $x$ with the top $k$ entries in absolute value replaced with 0 (the other entries being unchanged). Thus $x'$ approximates $x$ well on every coordinate. We call the pair $(A, \mathrm{Out})$ a solution to the point query problem. Given such a matrix $A$ and an output algorithm Out, the centralized server can obtain an approximation to the value of every entry in $x$, which, depending on the application, could be the frequency of an attribute-pair. It can also, e.g., extract the maximum frequencies of $x$, which are useful for obtaining the most frequent items. The centralized server obtains an entire histogram of values of coordinates in $x$, which is a useful low-memory representation of $x$. Notice that the communication is $mk$ words, as opposed to $nk$ if the servers were to transmit $x^1, \ldots, x^k$.

Our correctness guarantees hold for all input vectors simultaneously using one fixed $(A, \mathrm{Out})$ pair. This is stronger than, and should be contrasted with, the guarantee that the algorithm succeeds given $Ax$ with high probability for some fixed input $x$. For example, for the point query problem, the latter guarantee is achieved by the CountMin sketch [16] or CountSketch [TJ] . One of the reasons the randomized guarantee is less useful is adaptive queries. That is, suppose the centralized server computes $x'$ and transmits information about $x'$ to $S^1, \ldots, S^k$. Since $x'$ could depend on $A$, if the servers were to then use the same matrix $A$ to compute sketches $Ay^1, \ldots, Ay^k$ for databases $y^1, \ldots, y^k$ which depend on $x'$, then $A$ need not succeed, since it is not guaranteed to be correct with high probability for inputs $y^i$ which depend on $A$.

1.2 Notation and Problem Definitions 

Throughout this work $[n]$ denotes $\{1, \ldots, n\}$. For $q$ a prime power, $\mathbb{F}_q$ denotes the finite field of size $q$. For $x \in \mathbb{R}^n$ and $S \subseteq [n]$, $x_S$ denotes the vector with $(x_S)_i = x_i$ for $i \in S$, and $(x_S)_i = 0$ for $i \notin S$. The notation $x_{-i}$ is shorthand for $x_{[n]\setminus\{i\}}$. For a matrix $A \in \mathbb{R}^{m \times n}$ and integer $i \in [n]$, $A_i$ denotes the $i$th column of $A$. For matrices $A$ and vectors $x$, $A^T$ and $x^T$ denote their transposes. For $x \in \mathbb{R}^n$ and integer $k \le n$, we let $\mathrm{head}(x, k) \subseteq [n]$ denote the set of $k$ largest coordinates in $x$ in absolute value, and $\mathrm{tail}(x, k) = [n] \setminus \mathrm{head}(x, k)$. We often use $x_{\mathrm{head}(k)}$ to denote $x_{\mathrm{head}(x,k)}$, and similarly for the tail. For real numbers $a, b, \varepsilon \ge 0$, we use the notation $a = (1 \pm \varepsilon)b$ to convey that $a \in [(1 - \varepsilon)b, (1 + \varepsilon)b]$. A collection of vectors $\{C_1, \ldots, C_n\} \subseteq [q]^t$ is called a code with alphabet size $q$ and block length $t$, and we define $\Delta(C_i, C_j) = |\{k : (C_i)_k \ne (C_j)_k\}|$. The relative distance of the code is $\min_{i \ne j} \Delta(C_i, C_j)/t$.
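The head/tail notation can be made concrete with a short Python sketch. The helper names `head`, `tail`, and `restrict` are illustrative only; indices are 0-based here (the text uses $[n] = \{1, \ldots, n\}$), and ties in magnitude are broken by smaller index, which is an assumption the text does not specify.

```python
def head(x, k):
    """Indices of the k largest-magnitude coordinates of x (ties: smaller index first)."""
    order = sorted(range(len(x)), key=lambda i: (-abs(x[i]), i))
    return set(order[:k])


def tail(x, k):
    """Complement of head(x, k) within the index set of x."""
    return set(range(len(x))) - head(x, k)


def restrict(x, S):
    """The vector x_S: equal to x on S and 0 elsewhere."""
    return [v if i in S else 0 for i, v in enumerate(x)]


x = [5, -7, 1, 0, 2]
x_head = restrict(x, head(x, 2))              # keeps the two largest entries
x_tail = restrict(x, tail(x, 2))              # zeroes out the two largest entries
tail_l1 = sum(abs(v) for v in x_tail)         # the quantity ||x_tail(2)||_1
```

The $k$-tail guarantees below are stated in terms of quantities like `tail_l1`, which can be far smaller than $\|x\|_1$ when a few coordinates dominate.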

We now define the problems that we study in this work. In all these problems there is some error parameter $0 < \varepsilon < 1/2$, and we want to design a fixed matrix $A \in \mathbb{R}^{m \times n}$ and deterministic algorithm Out for each problem satisfying the following.

Problem 1: In the $\ell_\infty/\ell_1$ recovery problem, also called the point query problem, $\forall x \in \mathbb{R}^n$, $x' = \mathrm{Out}(Ax)$ satisfies $\|x - x'\|_\infty \le \varepsilon\|x\|_1$. The pair $(A, \mathrm{Out})$ furthermore satisfies the $k$-tail guarantee if actually $\|x - x'\|_\infty \le \varepsilon\|x_{\mathrm{tail}(k)}\|_1$.

Problem 2: In the inner product problem, $\forall x, y \in \mathbb{R}^n$, $\alpha = \mathrm{Out}(Ax, Ay)$ satisfies $|\alpha - \langle x, y\rangle| \le \varepsilon\|x\|_1\|y\|_1$.

Problem 3: In the $\ell_1/\ell_1$ recovery problem with the $k$-tail guarantee, $\forall x \in \mathbb{R}^n$, $x' = \mathrm{Out}(Ax)$ satisfies $\|x - x'\|_1 \le (1 + \varepsilon)\|x_{\mathrm{tail}(k)}\|_1$.

Problem 4: In the $\ell_2$ norm estimation problem, $\forall x \in \mathbb{R}^n$, $\alpha = \mathrm{Out}(Ax)$ satisfies $|\|x\|_2 - \alpha| \le \varepsilon\|x\|_1$.

We note that for the first, second, and fourth problems above, our errors are additive and not relative. This is because relative error is impossible to achieve with a sublinear number of measurements. If $A$ is a fixed matrix with $m < n$, then it has some non-trivial kernel. Since for all the problems above an Out procedure would have to output 0 when $Ax = 0$ to achieve bounded relative approximation, such a procedure would fail on any input vector in the kernel which is not the 0 vector.

For Problem 2 one could also ask to achieve additive error $\varepsilon\|x\|_p\|y\|_p$ for $p > 1$. For $y = e_i$ for a standard unit vector $e_i$, this would mean approximating $x_i$ up to additive error $\varepsilon\|x\|_p$. This is not possible unless $m = \Omega(n^{2-2/p})$ for $1 < p \le 2$ and $m = \Omega(n)$ for $p \ge 2$ [55].

For Problem 3, it is known that the analogous guarantee of returning $x'$ for which $\|x - x'\|_2 \le \varepsilon\|x_{\mathrm{tail}(k)}\|_2$ is not possible unless $m = \Omega(n)$ [15].

1.3 Our Contributions and Related Work 

We study the four problems stated above, where we have the deterministic guarantee that a single pair $(A, \mathrm{Out})$ provides the desired guarantee for all input vectors simultaneously. We first show that point query and inner product are equivalent up to changing $\varepsilon$ by a constant factor. We then show that any "incoherent matrix" $A$ can be used for these two problems to perform the linear measurements; that is, a matrix $A$ whose columns have unit $\ell_2$ norm and such that each pair of columns has dot product at most $\varepsilon$ in magnitude. Such matrices can be obtained from the Johnson-Lindenstrauss (JL) lemma [30], almost pairwise independent sample spaces [71 140], or error-correcting codes, and they play a prominent role in compressed sensing [19U38] and mathematical approximation theory [26]. The connection between point query and codes was implicit in [33], though a suboptimal code was used, and the observation that the more general class of incoherent matrices suffices is novel. This connection allows us to show that $m = O(\varepsilon^{-2} \min\{\log n, (\log n/\log(1/\varepsilon))^2\})$ measurements suffice, where Out and the construction of $A$ are completely deterministic. Alon has shown that any incoherent matrix must have $m = \Omega(\varepsilon^{-2} \log n/\log(1/\varepsilon))$ [6]. Meanwhile the best known lower bound for point query is $m = \Omega(\varepsilon^{-2} + \varepsilon^{-1}\log(\varepsilon n))$ [20, 21, 28], and the previous best known upper bound was $m = O(\varepsilon^{-2}\log^2 n/(\log 1/\varepsilon + \log\log n))$ [23]. If the construction of $A$ is allowed to be Las Vegas polynomial time, then we can use the Fast Johnson-Lindenstrauss transforms [5] [3] 2] [35] so that $Ax$ can be computed quickly, e.g. in $O(n \log m)$ time as long as $m < n^{1/2-\gamma}$ [3], and with $m = O(\varepsilon^{-2}\log n)$. Our Out algorithm is equally fast. We also show that for point query, if we allow the measurement matrix $A$ to be constructed by a polynomial Monte Carlo algorithm, then the $1/\varepsilon^2$-tail guarantee can be obtained essentially "for free", i.e. by keeping $m = O(\varepsilon^{-2}\log n)$. Previously the work [23] only showed how to obtain the $1/\varepsilon$-tail guarantee "for free" in this sense of not increasing $m$ (though the $m$ in [23] was worse). We note that for randomized algorithms which succeed with high probability for any given input, it suffices to take $m = O(\varepsilon^{-1}\log n)$ by using the CountMin data structure [16], and this is optimal [31] (the lower bound in [31] is stated for the so-called heavy hitters problem, but also applies to the $\ell_\infty/\ell_1$ recovery problem).

For the $\ell_1/\ell_1$ sparse recovery problem with the $k$-tail guarantee, we show a lower bound of $m = \Omega(k\log(\varepsilon n/k)/\varepsilon + k/\varepsilon^2)$. The best upper bound is $O(k\log(n/k)/\varepsilon^2)$ [29]. Our lower bound implies a separation for the complexity of this problem in the case that one must simply pick a random $(A, \mathrm{Out})$ pair which works for some given input $x$ with high probability (i.e. not for all $x$ simultaneously), since [41] showed an $m = O(k\log n\log^3(1/\varepsilon)/\sqrt{\varepsilon})$ upper bound in this case. The first summand of our lower bound uses techniques used in [9, 41]. The second summand uses a generalization of an argument of Gluskin [28], which was later rediscovered by Ganguly [21], which showed the lower bound $m = \Omega(1/\varepsilon^2)$ for point query.

Finally, we show how to devise an appropriate $(A, \mathrm{Out})$ for $\ell_2$ norm estimation with $m = O(\varepsilon^{-2}\log(\varepsilon^2 n))$, which is optimal. The construction of $A$ is randomized but then works for all $x$ with high probability. The proof takes $A$ according to known upper bounds on Gelfand widths, and the recovery procedure Out requires solving a simple convex program. As far as we are aware, this is the first work to investigate this problem in the deterministic setting. In the case that $(A, \mathrm{Out})$ can be chosen randomly to work for any fixed $x$ with high probability, one can use the AMS sketch [5] with $m = O(\varepsilon^{-2}\log(1/\delta))$ to succeed with probability $1 - \delta$ and to obtain the better guarantee $\varepsilon\|x\|_2$. The AMS sketch can also be used for the inner product problem to obtain error guarantee $\varepsilon\|x\|_2\|y\|_2$ with the same $m$.



2 Point Query and Inner Product Estimation 



We first show that the problems of point query and inner product estimation are equivalent up to changing the error parameter $\varepsilon$ by a constant factor.

Theorem 1. Any solution $(A, \mathrm{Out}')$ to inner product estimation with error parameter $\varepsilon$ yields a solution $(A, \mathrm{Out})$ to the point query problem with error parameter $\varepsilon$. Also, a solution $(A, \mathrm{Out})$ for point query with error $\varepsilon$ yields a solution $(A, \mathrm{Out}')$ to inner product with error $12\varepsilon$. The time complexities of Out and Out' are equal up to poly(n) factors.

Proof: Let $(A, \mathrm{Out}')$ be a solution to the inner product problem such that $\mathrm{Out}'(Ax, Ay) = \langle x, y\rangle \pm \varepsilon\|x\|_1\|y\|_1$. Then given $x \in \mathbb{R}^n$, to solve the point query problem we return the vector with $\mathrm{Out}(Ax)_i = \mathrm{Out}'(Ax, Ae_i)$, and our guarantees are immediate.

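Now let $(A, \mathrm{Out})$ be a solution to the point query problem. Then given $x, y \in \mathbb{R}^n$, let $x' = \mathrm{Out}(Ax)$, $y' = \mathrm{Out}(Ay)$. Our estimate for the inner product is $\mathrm{Out}'(Ax, Ay) = \langle x'_{\mathrm{head}(1/\varepsilon)}, y'_{\mathrm{head}(1/\varepsilon)}\rangle$. Observe the following: any coordinate $i$ with $|x'_i| \ge 2\varepsilon\|x\|_1$ must have $|x_i| \ge \varepsilon\|x\|_1$, and thus there are at most $1/\varepsilon$ such coordinates. Also, any $i$ with $|x_i| \ge 3\varepsilon\|x\|_1$ will have $|x'_i| \ge 2\varepsilon\|x\|_1$. Thus, $\{i : |x_i| \ge 3\varepsilon\|x\|_1\} \subseteq \mathrm{head}(x', 1/\varepsilon)$, and similarly for $x$ replaced with $y$. Now,

$$\begin{aligned}
|\langle x'_{\mathrm{head}(1/\varepsilon)}, y'_{\mathrm{head}(1/\varepsilon)}\rangle - \langle x, y\rangle| \le{}& |\langle x'_{\mathrm{head}(1/\varepsilon)}, y'_{\mathrm{head}(1/\varepsilon)}\rangle - \langle x_{\mathrm{head}(x',1/\varepsilon)}, y_{\mathrm{head}(y',1/\varepsilon)}\rangle| \\
&+ |\langle x_{\mathrm{head}(x',1/\varepsilon)}, y_{\mathrm{tail}(y',1/\varepsilon)}\rangle| + |\langle x_{\mathrm{tail}(x',1/\varepsilon)}, y_{\mathrm{head}(y',1/\varepsilon)}\rangle| \\
&+ |\langle x_{\mathrm{tail}(x',1/\varepsilon)}, y_{\mathrm{tail}(y',1/\varepsilon)}\rangle|.
\end{aligned}$$

We can bound $|\langle x'_{\mathrm{head}(1/\varepsilon)}, y'_{\mathrm{head}(1/\varepsilon)}\rangle - \langle x_{\mathrm{head}(x',1/\varepsilon)}, y_{\mathrm{head}(y',1/\varepsilon)}\rangle|$ by

$$\varepsilon\|y\|_1 \sum_{i \in \mathrm{head}(x',1/\varepsilon)} |x_i| + \varepsilon\|x\|_1 \sum_{i \in \mathrm{head}(y',1/\varepsilon)} |y_i| + \frac{1}{\varepsilon} \cdot \varepsilon^2\|x\|_1\|y\|_1 \le 3\varepsilon\|x\|_1\|y\|_1.$$

We can also bound

$$|\langle x_{\mathrm{head}(x',1/\varepsilon)}, y_{\mathrm{tail}(y',1/\varepsilon)}\rangle| + |\langle x_{\mathrm{tail}(x',1/\varepsilon)}, y_{\mathrm{head}(y',1/\varepsilon)}\rangle| \le \|x\|_1\|y_{\mathrm{tail}(y',1/\varepsilon)}\|_\infty + \|x_{\mathrm{tail}(x',1/\varepsilon)}\|_\infty\|y\|_1 \le 6\varepsilon\|x\|_1\|y\|_1,$$

where the last step uses that every coordinate outside $\mathrm{head}(x', 1/\varepsilon)$ has magnitude less than $3\varepsilon\|x\|_1$ in $x$, and similarly for $y$.

Finally we have the bound

$$|\langle x_{\mathrm{tail}(x',1/\varepsilon)}, y_{\mathrm{tail}(y',1/\varepsilon)}\rangle| \le \|x_{\mathrm{tail}(x',1/\varepsilon)}\|_2\,\|y_{\mathrm{tail}(y',1/\varepsilon)}\|_2. \qquad (1)$$

Since $\|x_{\mathrm{tail}(x',1/\varepsilon)}\|_\infty \le 3\varepsilon\|x\|_1$ and $\|x_{\mathrm{tail}(x',1/\varepsilon)}\|_1 \le \|x\|_1$, we have that $\|x_{\mathrm{tail}(x',1/\varepsilon)}\|_2$ is maximized when it has exactly $1/(3\varepsilon)$ coordinates each of value exactly $3\varepsilon\|x\|_1$, which yields $\ell_2$ norm $\sqrt{3\varepsilon}\|x\|_1$, and similarly for $x$ replaced with $y$. Thus the right hand side of Eq. (1) is bounded by $3\varepsilon\|x\|_1\|y\|_1$. Thus in summary, our total error in inner product estimation is $12\varepsilon\|x\|_1\|y\|_1$.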
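The second direction of the proof can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the point query outputs `x_est` and `y_est` are simulated by adding a perturbation of magnitude $\varepsilon\|x\|_1/2$ per coordinate (a stand-in for a real $(A, \mathrm{Out})$ pair), the parameter `k` plays the role of $1/\varepsilon$, and the helper names are hypothetical. Exact rationals are used so the final comparison is free of rounding.

```python
from fractions import Fraction


def head_restrict(v, k):
    """Keep the k largest-magnitude entries of v (ties: smaller index first); zero the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: (-abs(v[i]), i))[:k])
    return [val if i in keep else 0 for i, val in enumerate(v)]


def inner_product_estimate(x_est, y_est, k):
    """Out'(Ax, Ay) from the proof: inner product of the two head(1/eps) restrictions."""
    xh = head_restrict(x_est, k)
    yh = head_restrict(y_est, k)
    return sum(a * b for a, b in zip(xh, yh))


eps = Fraction(1, 10)
k = 10                                    # k = 1/eps
x = [10, -6, 3] + [0] * 27
y = [0, 5, 4, 2] + [0] * 26
l1_x = sum(abs(v) for v in x)
l1_y = sum(abs(v) for v in y)

# Simulated point query outputs (assumption): l_inf error eps*||.||_1 / 2, alternating signs.
x_est = [v + (-1) ** i * eps * l1_x / 2 for i, v in enumerate(x)]
y_est = [v + (-1) ** (i + 1) * eps * l1_y / 2 for i, v in enumerate(y)]

estimate = inner_product_estimate(x_est, y_est, k)
exact = sum(a * b for a, b in zip(x, y))
within_bound = abs(estimate - exact) <= 12 * eps * l1_x * l1_y
```

On such a small instance the $12\varepsilon\|x\|_1\|y\|_1$ bound is very loose; the example is only meant to show which quantities the estimator touches.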

Since the two problems are equivalent up to changing $\varepsilon$ by a constant factor, we focus on the point query problem. We first show that any incoherent matrix $A$ has a correct associated output procedure Out. By an incoherent matrix, we mean an $m \times n$ matrix $A$ for which all columns $A_i$ of $A$ have unit $\ell_2$ norm, and for all $i \ne j$ we have $|\langle A_i, A_j\rangle| \le \varepsilon$. We have the following lemma.
Lemma 2. Any incoherent matrix $A$ with error parameter $\varepsilon$ has an associated poly(mn)-time deterministic recovery procedure Out for which $(A, \mathrm{Out})$ is a solution to the point query problem. In fact, for any $x \in \mathbb{R}^n$, given $Ax$ and $i \in [n]$, the output $x'_i$ satisfies $|x'_i - x_i| \le \varepsilon\|x_{-i}\|_1$.

Proof: Let $x \in \mathbb{R}^n$ be arbitrary. We define $\mathrm{Out}(Ax) = A^T Ax$. Observe that for any $i \in [n]$, we have

$$x'_i = A_i^T Ax = \sum_{j=1}^n \langle A_i, A_j\rangle x_j = x_i \pm \varepsilon\|x_{-i}\|_1.$$
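The recovery rule $\mathrm{Out}(Ax) = A^T Ax$ from Lemma 2 can be checked numerically on a tiny example. The matrix below is an assumption made for illustration and does not come from the text: it places the $4 \times 4$ identity next to a normalized $4 \times 4$ Hadamard matrix, giving $n = 8$ unit-norm columns in $m = 4$ dimensions whose pairwise inner products are 0 or $\pm 1/2$. All entries are multiples of $1/2$, so the floating-point arithmetic here is exact.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def coherence(A):
    """Largest |<A_i, A_j>| over pairs of distinct columns of A."""
    cols = transpose(A)
    return max(abs(sum(a * b for a, b in zip(cols[i], cols[j])))
               for i in range(len(cols)) for j in range(i + 1, len(cols)))


def point_query(A, Ax):
    """Out(Ax) = A^T (Ax), as in the proof of Lemma 2."""
    return matvec(transpose(A), Ax)


# Hypothetical 4 x 8 matrix: identity block next to a normalized Hadamard block.
I4 = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
H4 = [[0.5, 0.5, 0.5, 0.5],
      [0.5, -0.5, 0.5, -0.5],
      [0.5, 0.5, -0.5, -0.5],
      [0.5, -0.5, -0.5, 0.5]]
A = [I4[r] + H4[r] for r in range(4)]

eps = coherence(A)                        # incoherence parameter of this A
x = [10, 0, 0, 0, 1, -1, 0, 2]
x_est = point_query(A, matvec(A, x))
l1 = sum(abs(v) for v in x)

# Lemma 2: |x'_i - x_i| <= eps * ||x_{-i}||_1 for every coordinate i.
errors_ok = all(abs(x_est[i] - x[i]) <= eps * (l1 - abs(x[i])) for i in range(len(x)))
```

With $\varepsilon = 1/2$ the guarantee is weak; the constructions discussed in this section are what make $\varepsilon$ small while keeping $m$ far below $n$.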



It is known that any incoherent matrix has $m = \Omega((\log n)/(\varepsilon^2\log 1/\varepsilon))$ [6], and the JL lemma implies such matrices with $m = O((\log n)/\varepsilon^2)$ [30]. For example, there exist matrices in $\{-1/\sqrt{m}, 1/\sqrt{m}\}^{m \times n}$ satisfying this property [T|, which can also be found in poly(n) time [55] (we note that [55] gives running time exponential in precision, but the proof holds if the precision is taken to be $O(\log(n/\varepsilon))$). It is also known that incoherent matrices can be obtained from almost pairwise independent sample spaces [3 I40j or error-correcting codes, and thus these tools can also be used to solve the point query problem. The connection to codes was already implicit in [23], though the code used in that work is suboptimal, as we will show soon. Below we elaborate on what bounds these tools provide for incoherent matrices, and what they imply for the point query problem.

Incoherent matrices from JL: The upside of the connection to the JL lemma is that we can obtain matrices $A$ for the point query problem such that $Ax$ can be computed quickly, via the Fast Johnson-Lindenstrauss Transform introduced by Ailon and Chazelle [2] or related subsequent works. The JL lemma states the following.

Theorem 3 (JL lemma). For any $x_1, \ldots, x_N \in \mathbb{R}^n$ and any $0 < \varepsilon < 1/2$, there exists $A \in \mathbb{R}^{m \times n}$ with $m = O(\varepsilon^{-2}\log N)$ such that for all $i, j \in [N]$ we have $\|Ax_i - Ax_j\|_2 = (1 \pm \varepsilon)\|x_i - x_j\|_2$.

Consider the matrix $A$ obtained from the JL lemma when the set of vectors is $\{0, e_1, \ldots, e_n\} \subset \mathbb{R}^n$. Then columns $A_i$ of $A$ have $\ell_2$ norm $1 \pm \varepsilon$, and furthermore for $i \ne j$ we have $|\langle A_i, A_j\rangle| = |\|A_i - A_j\|_2^2 - \|A_i\|_2^2 - \|A_j\|_2^2|/2 = |2(1 \pm \varepsilon)^2 - (1 \pm \varepsilon)^2 - (1 \pm \varepsilon)^2|/2 \le (2(1 + \varepsilon)^2 - 2(1 - \varepsilon)^2)/2 = 4\varepsilon$. By scaling each column to have $\ell_2$ norm exactly 1, we still preserve that dot products between pairs of columns are $O(\varepsilon)$ in magnitude.

Incoherent matrices from almost pairwise independence: Next we elaborate on the connection 
between incoherent matrices and almost pairwise independence. 

Definition 4. An $\varepsilon$-almost $k$-wise independent sample space is a set $S \subseteq \{-1, 1\}^n$ satisfying the following. For any $T \subseteq [n]$, $|T| = k$, the $\ell_1$ distance between the uniform distribution over $\{-1, 1\}^k$ and the distribution of $x(T)$ when $x$ is drawn uniformly at random from $S$ is at most $\varepsilon$. Here $x(T) \in \{-1, 1\}^{|T|}$ is the bitstring $x$ projected onto the coordinates in $T$.

Note that if $S$ is $\varepsilon$-almost $k$-wise independent, then for any $|T| = k$, $|\mathbf{E}_{x \in S} \prod_{i \in T} x_i| \le \varepsilon$. Therefore if we choose $k = 2$ and form a $|S| \times n$ matrix $A$ where the rows of $A$ are the elements of $S$, divided by a scale factor of $\sqrt{|S|}$, then $A$ is incoherent. Known constructions of almost pairwise independent sample spaces give $|S| = \mathrm{poly}(\varepsilon^{-1}\log n)$ [2l[T2l|40]. We do not delve into the specific bounds on $|S|$ since they yield worse results than the JL-based construction above. The probabilistic method implies that such an $S$ exists with $|S| = O(\varepsilon^{-2}\log n)$, matching the JL construction, but an explicit almost pairwise independent sample space with this size is currently not known.
Time | m | Construction | Explicit?
$O((n\log n)/\varepsilon^2)$ | $O(\varepsilon^{-2}\log n)$ | | yes
$O((n\log n)/\varepsilon)$ | $O(\varepsilon^{-2}\log n)$ | sparse JL [32], GV code | no
 | $O(d^2/\varepsilon^2)$ | Reed-Solomon code | yes
 | $O(\varepsilon^{-2}\log n)$ | FFT-based JL [3] | no
$O(n\log n)$ | $O(\varepsilon^{-2}\log^5 n)$ | FFT-based JL H |35] | no

Figure 1: Implications for point query from JL matrices and codes. Time indicates the running time to compute $Ax$ given $x$. In the case of Reed-Solomon, $d = O(\log n/(\log\log n + \log(1/\varepsilon)))$. We say the construction is "explicit" if $A$ can be computed in deterministic time poly(n); otherwise we only provide a polynomial time Las Vegas algorithm to construct $A$.

Incoherent matrices from codes: Finally we explain the connection between incoherent matrices and codes. A connection to balanced binary codes was made in [5], and to arbitrary codes over larger alphabets without detail in a remark in [5]. Though not novel, we elaborate on this latter connection for the sake of completeness. Let $C = \{C_1, \ldots, C_n\}$ be a code with alphabet size $q$, block length $t$, and relative distance $1 - \varepsilon$. The fact that such a code gives rise to a matrix $A \in \mathbb{R}^{m \times n}$ for point query with error parameter $\varepsilon$ was implicit in [33], but we make it explicit here. We let $m = qt$ and conceptually partition the rows of $A$ arbitrarily into $t$ sets each of size $q$. For the column $A_i$, let $(A_i)_{j,k}$ denote the entry of $A_i$ in the $k$th coordinate of the $j$th block. We set $(A_i)_{j,k} = 1/\sqrt{t}$ if $(C_i)_j = k$, and $(A_i)_{j,k} = 0$ otherwise. Said differently, for $y = Ax$ we label the entries of $y$ with double-indices $(i, j) \in [t] \times [q]$. We define deterministic hash functions $h_1, \ldots, h_t : [n] \to [q]$ by $h_i(j) = (C_j)_i$, and we set $y_{i,j} = \sum_{k : h_i(k) = j} x_k/\sqrt{t}$. Our procedure Out produces a vector $x'$ with $x'_k = \sum_{i=1}^t y_{i,h_i(k)}/\sqrt{t}$, which is exactly $A^T Ax$. Each column has exactly $t$ non-zero entries of value $1/\sqrt{t}$, and thus has $\ell_2$ norm 1. Furthermore, for $i \ne j$, $\langle A_i, A_j\rangle = (t - \Delta(C_i, C_j))/t \le \varepsilon$.
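The hash-function view of the code-based sketch can be written out directly. The Python below is a minimal sketch under stated assumptions: the toy code (all pairs over an alphabet of size 3, so $t = 2$, $q = 3$, $n = 9$, $m = qt = 6$, and distinct codewords agree in at most one position, i.e. $\varepsilon = 1/2$) is hypothetical, and the two $1/\sqrt{t}$ factors are folded into a single division by $t$ at recovery time so that the arithmetic stays exact.

```python
from fractions import Fraction


def code_sketch(code, q, x):
    """Bucket sums: y[i][j] = sum of x_k over all k with h_i(k) = code[k][i] = j.

    The 1/sqrt(t) scaling of the text is omitted here and applied in code_recover.
    """
    t = len(code[0])
    y = [[0] * q for _ in range(t)]
    for k, xk in enumerate(x):
        for i in range(t):
            y[i][code[k][i]] += xk
    return y


def code_recover(code, y):
    """x'_k = (1/t) * sum_i y[i][h_i(k)], i.e. A^T A x with both scalings combined."""
    t = len(code[0])
    return [Fraction(sum(y[i][cw[i]] for i in range(t)), t) for cw in code]


# Hypothetical toy code: codeword k is the pair (k // 3, k % 3).
q, t = 3, 2
code = [[k // q, k % q] for k in range(q * q)]

x = [1, 0, 0, 0, 8, 0, 0, 0, -2]
y = code_sketch(code, q, x)               # t * q = 6 numbers summarize 9 coordinates
x_est = code_recover(code, y)

eps = Fraction(1, 2)
l1 = sum(abs(v) for v in x)
errors_ok = all(abs(x_est[k] - x[k]) <= eps * (l1 - abs(x[k])) for k in range(len(x)))
```

Each row of `y` behaves like one hash table of counters, which is why this construction resembles a deterministic analogue of CountMin-style sketches.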

The work [23] instantiated the above with the following Chinese remainder code [361 1441 146]. Let $p_1 < \ldots < p_t$ be primes, and let $q = p_t$. We let $(C_i)_j = i \bmod p_j$. To obtain $n$ codewords with relative distance $1 - \varepsilon$, this construction required setting $t = O(\varepsilon^{-1}\log n/(\log(1/\varepsilon) + \log\log n))$ and $p_1, p_t = \Theta(\varepsilon^{-1}\log n) = O(t\log t)$. The proof uses that for $i, j \in [n]$, $|i - j|$ has at most $\log_{p_1} n$ prime factors greater than or equal to $p_1$, and thus $C_i, C_j$ can have at most $\log_{p_1} n$ many equal coordinates. This yields $m = tq = O(\varepsilon^{-2}\log^2 n/(\log 1/\varepsilon + \log\log n))$. We observe here that this bound is never optimal. A random code with $q = 2/\varepsilon$ and $t = O(\varepsilon^{-1}\log n)$ has the desired properties by applying the Chernoff bound on a pair of codewords, then a union bound over codewords (alternatively, such a code is promised by the Gilbert-Varshamov (GV) bound). If $\varepsilon$ is sufficiently small, a Reed-Solomon code performs even better. That is, we take a finite field $\mathbb{F}_q$ for $q = \Theta(\varepsilon^{-1}\log n/(\log\log n + \log(1/\varepsilon)))$ and $q = t$, and each $C_i$ corresponds to a distinct degree-$d$ polynomial $p_i$ over $\mathbb{F}_q$ for $d = \Theta(\log n/(\log\log n + \log(1/\varepsilon)))$ (note there are at least $q^d > n$ such polynomials). We set $(C_i)_j = p_i(j)$. The relative distance is as desired since $p_i - p_j$ has at most $d$ roots over $\mathbb{F}_q$, and thus $C_i$ and $C_j$ can agree in at most $d \le \varepsilon t$ coordinates. This yields $qt = O(\varepsilon^{-2}(\log n/(\log\log n + \log(1/\varepsilon)))^2)$, which surpasses the GV bound for $\varepsilon < 2^{-\Omega(\sqrt{\log n})}$, and is always better than the Chinese remainder code. We note that this construction of a binary matrix based on Reed-Solomon codes is identical to one used by Kautz and Singleton in the different context of group testing [34].
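The distance property of the Reed-Solomon construction is easy to check by brute force for tiny parameters. The following Python sketch makes two simplifying assumptions not taken from the text: $q$ is a prime (so arithmetic mod $q$ is a field), and codewords are the evaluations of all polynomials of degree at most $d$, giving $q^{d+1}$ codewords. The function names are hypothetical.

```python
from itertools import combinations, product


def reed_solomon_code(q, d):
    """Evaluations at 0..q-1 of every polynomial of degree at most d over Z_q (q prime)."""
    code = []
    for coeffs in product(range(q), repeat=d + 1):
        code.append([sum(c * pow(j, e, q) for e, c in enumerate(coeffs)) % q
                     for j in range(q)])
    return code


def max_agreement(code):
    """Largest number of positions in which two distinct codewords coincide."""
    return max(sum(a == b for a, b in zip(u, v)) for u, v in combinations(code, 2))


q, d = 5, 2
code = reed_solomon_code(q, d)
n, t = len(code), q                       # number of codewords and block length
agree = max_agreement(code)               # at most d, since p_i - p_j has at most d roots
m = q * t                                 # rows of the resulting measurement matrix
```

For these toy parameters the resulting matrix has $m = 25$ rows for $n = 125$ columns, and any two columns have inner product at most $d/t = 2/5$.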

In Figure 1 we elaborate on what known constructions of codes and JL matrices provide for us in terms of point query. In the case of running time for the Reed-Solomon construction, we use that degree-$d$ polynomials can be evaluated on $d + 1$ points in a total of $O(d\log^2 d\log\log d)$ field operations over $\mathbb{F}_q$ [Ch. 10]. In the case of [3], the constant $\gamma > 0$ can be chosen arbitrarily, and the constant in the big-Oh depends on $1/\gamma$. We note that except in the case of Reed-Solomon codes, the construction of $A$ is randomized (though once $A$ is generated, incoherence can be verified in polynomial time, thus providing a poly(n)-time Las Vegas algorithm).

Note that Lemma 2 did not just give us error $\varepsilon\|x\|_1$, but actually gave us $|x_i - x'_i| \le \varepsilon\|x_{-i}\|_1$, which is stronger. We now show that an even stronger guarantee is possible. We will show that in fact it is possible to obtain $\|x - x'\|_\infty \le \varepsilon\|x_{\mathrm{tail}(1/\varepsilon^2)}\|_1$ while increasing $m$ by only an additive $O(\varepsilon^{-2}\log(\varepsilon^2 n))$, which is less than our original $m$ except potentially in the Reed-Solomon construction. The idea is to, in parallel, recover a good approximation of $x_{\mathrm{head}(1/\varepsilon^2)}$ with error proportional to $\|x_{\mathrm{tail}(1/\varepsilon^2)}\|_1$ via compressed sensing, then to subtract from $Ax$ before running our recovery procedure. We now give details.

We in parallel run a $k$-sparse recovery algorithm which has the following guarantee: there is a pair $(B, \mathrm{Out}')$ such that for any $x \in \mathbb{R}^n$, we have that $x' = \mathrm{Out}'(Bx) \in \mathbb{R}^n$ satisfies $\|x' - x\|_2 \le O(1/\sqrt{k})\|x_{\mathrm{tail}(k)}\|_1$. Such a matrix $B$ can be taken to have the restricted isometry property of order $k$ ($k$-RIP), i.e. that it preserves the $\ell_2$ norm up to a small multiplicative constant factor for all $k$-sparse vectors in $\mathbb{R}^n$.$^1$ It is known [27] that any such $x'$ also satisfies the guarantee that $\|x'_{\mathrm{head}(k)} - x\|_1 \le O(1)\|x_{\mathrm{tail}(k)}\|_1$, where $x'_{\mathrm{head}(k)}$ is the vector which agrees with $x'$ on the top $k$ coordinates in magnitude and is 0 on the remaining coordinates. Moreover, it is also known [TU] that if $B$ satisfies the JL lemma for a particular set of $N = (en/k)^{O(k)}$ points in $\mathbb{R}^n$, then $B$ will be $k$-RIP. The associated output procedure Out' takes $Bx$ and outputs $\mathrm{argmin}_{z : Bz = Bx} \|z\|_1$ by solving a linear program [13]. All the JL matrices in Figure 1 provide this guarantee with $O(k\log(en/k))$ rows, except for the last row which satisfies $k$-RIP with $O(k\log(en/k)\log^2 k\log(k\log n))$ rows [42].

Theorem 5. Let $A$ be an incoherent matrix with error parameter $\varepsilon$, and let $B$ be $k$-RIP. Then there is an output procedure Out which for any $x \in \mathbb{R}^n$, given only $Ax, Bx$, outputs a vector $x'$ with $\|x' - x\|_\infty \le \varepsilon\|x_{\mathrm{tail}(k)}\|_1$.

Proof: Given $Bx$, we first run the $k$-sparse recovery algorithm to obtain a vector $y$ with $\|x - y\|_1 = O(1)\|x_{\mathrm{tail}(k)}\|_1$. We then construct our output vector $x'$ coordinate by coordinate. To construct $x'_i$, we replace $y_i$ with 0, obtaining the vector $z^i$. Then we compute $A(x - z^i)$ and run the point query output procedure associated with $A$ and index $i$. The guarantee is that the output $w^i$ of the point query algorithm satisfies $|w^i_i - (x - z^i)_i| \le \varepsilon\|(x - z^i)_{-i}\|_1$, where

$$\|(x - z^i)_{-i}\|_1 = \|(x - y)_{-i}\|_1 \le \|x - y\|_1 = O(1)\|x_{\mathrm{tail}(k)}\|_1,$$

and so $|(w^i + z^i)_i - x_i| = O(\varepsilon)\|x_{\mathrm{tail}(k)}\|_1$. If we define our output vector by $x'_i = w^i_i + z^i_i$ and rescale $\varepsilon$ by a constant factor, this proves the theorem.

By setting $k = 1/\varepsilon^2$ in Theorem 5 and stacking the rows of a $k$-RIP and incoherent matrix each with $O((\log n)/\varepsilon^2)$ rows, we obtain the following corollary, which says that by increasing the number of measurements $m = O(\varepsilon^{-2}\log n)$ by only a constant factor, we can obtain a stronger tail guarantee.

Corollary 6. There is an $m \times n$ matrix $A$ and associated output procedure Out which for any $x \in \mathbb{R}^n$, given $Ax$, outputs a vector $x'$ with $\|x' - x\|_\infty \le \varepsilon\|x_{\mathrm{tail}(1/\varepsilon^2)}\|_1$. Here $m = O((\log n)/\varepsilon^2)$.

Of course, again by using various choices of incoherent matrices and $k$-RIP matrices, we can trade off the number of linear measurements against the running time and tail guarantee. It is also possible to obtain a tail-error guarantee for inner product. While this is implied black-box by reducing from point query with the $k$-tail guarantee, by performing the argument from scratch we can obtain a better error guarantee involving mixed $\ell_1$ and $\ell_2$ norms.

Theorem 7. Suppose $1/\varepsilon^2 < n/2$. There is an $(A, \mathrm{Out})$ with $A \in \mathbb{R}^{m \times n}$ for $m = O(\varepsilon^{-2}\log n)$ such that for any $x, y \in \mathbb{R}^n$, $\mathrm{Out}(Ax, Ay)$ gives an output which is $\langle x, y\rangle \pm \big(\varepsilon(\|x\|_2\|y_{\mathrm{tail}(1/\varepsilon^2)}\|_1 + \|x_{\mathrm{tail}(1/\varepsilon^2)}\|_1\|y\|_2) + \varepsilon^2\|x_{\mathrm{tail}(1/\varepsilon^2)}\|_1\|y_{\mathrm{tail}(1/\varepsilon^2)}\|_1\big)$.

Proof: Using the $\ell_2/\ell_1$ sparse recovery mentioned in Section 2, we can recover $x', y'$ such that $\|x - x'\|_2 \le \varepsilon\|x_{\mathrm{tail}(1/\varepsilon^2)}\|_1$, and similarly for $y - y'$. The number of measurements is the number of measurements required for $1/\varepsilon^2$-RIP, which is $O(\varepsilon^{-2}\log(\varepsilon^2 n))$. Our estimation procedure Out simply outputs $\langle x', y'\rangle$. Then,

$$\begin{aligned}
|\langle x, y\rangle - \langle x', y'\rangle| &= \Big|\sum_i x_i(y_i - y'_i) + y'_i(x_i - x'_i)\Big| \\
&\le \Big|\sum_i x_i(y_i - y'_i)\Big| + \Big|\sum_i y'_i(x_i - x'_i)\Big| \\
&\le \|x\|_2\|y - y'\|_2 + \|y'\|_2\|x - x'\|_2 \\
&\le \|x\|_2\|y - y'\|_2 + (\|y - y'\|_2 + \|y\|_2)\|x - x'\|_2.
\end{aligned}$$

The theorem then follows by our bounds on $\|x - x'\|_2$ and $\|y - y'\|_2$.

$^1$ Unfortunately, the only currently known constructions of $k$-RIP matrices with the values of $m$ we discuss are Monte Carlo, forcing our algorithms in this section with the $k$-tail guarantee to only be Monte Carlo polynomial time when constructing the measurement matrix.

Note that again $A$ and Out in Theorem 7 can be applied efficiently by using RIP matrices based on the Fast Johnson-Lindenstrauss Transform.



3 Lower Bound for $\ell_\infty/\ell_1$ Recovery

Here we provide a lower bound for the point query problem addressed in Section 2.

Theorem 8. Let $0 < \varepsilon < \varepsilon_0$ for some universal constant $\varepsilon_0 < 1$. Suppose $1/\varepsilon^2 < n/2$, and $A$ is an $m \times n$ matrix for which given $Ax$ it is always possible to produce a vector $x'$ such that $\|x - x'\|_\infty \le \varepsilon\|x_{\mathrm{tail}(k)}\|_1$. Then $m = \Omega(k\log(n/k)/\log k + \varepsilon^{-2} + \varepsilon^{-1}\log n)$.

Proof: The lower bound of $\Omega(\varepsilon^{-2})$ for any $k$ is already proven in [21].

The lower bound of $\Omega(k\log(n/k)/\log k + \varepsilon^{-1}\log n)$ follows from a standard volume argument. For completeness, we give the argument below. Let $B_1(x, r)$ denote the $\ell_1$ ball centered at $x$ of radius $r$. We use the following lemma by Gilbert-Varshamov (see e.g. [§]).

Lemma 9 ([Lemma 3.1]). For any $q, k \in \mathbb{Z}^+$, $\varepsilon \in \mathbb{R}^+$ with $\varepsilon < 1 - 1/q$, there exists a set $S \subset \{0, 1\}^{qk}$ of binary vectors with exactly $k$ ones, such that $S$ has minimum Hamming distance $2\varepsilon k$ and

$$\log|S| > (1 - H_q(\varepsilon))k\log q$$

where $H_q$ is the $q$-ary entropy function $H_q(x) = -x\log_q\frac{x}{q-1} - (1 - x)\log_q(1 - x)$.

Assume $\varepsilon < 1/200$. Consider a set $S$ of $n$-dimensional binary vectors in $\mathbb{R}^n$ with exactly $1/(5\varepsilon)$ ones such that the minimum Hamming distance between any two vectors in $S$ is at least $1/(10\varepsilon)$. By the above lemma, we can get $\log|S| = \Omega(\varepsilon^{-1}\log(\varepsilon n))$. For any $x \in S$ and $z \in B_1(x, 1/(200\varepsilon))$, we have $\|z_{\mathrm{tail}(k)}\|_1 \le \|z\|_1 \le 1/(5\varepsilon) + 1/(200\varepsilon) = 41/(200\varepsilon)$, so $z \in B_1(0, 41/(200\varepsilon))$, and there are at most $4/(200\varepsilon)$ coordinates that are ones in $x$ and smaller than $3/4$ in $z$, and at most $4/(200\varepsilon)$ coordinates that are zeros in $x$ and at least $1/4$ in $z$. If $z'$ is a good approximation of $z$, then $\|z' - z\|_\infty \le 41/200 < 1/4$, so the set of indices of coordinates of $z'$ that are at least $1/2$ differs from the support of $x$ in at most $8/(200\varepsilon) < 1/(20\varepsilon)$ places. Thus, for any two different vectors $x, y \in S$ and $z \in B_1(x, 1/(200\varepsilon))$, $t \in B_1(y, 1/(200\varepsilon))$, the outputs for inputs $z$ and $t$ are different and hence, we must have $Az \ne At$. Notice that for the mapping $x \mapsto Ax$, the image of $B_1(x, 1/(200\varepsilon))$ is the translated version of the image of $B_1(0, 41/(200\varepsilon))$ scaled down in every dimension by a factor of 41. For $x$'s in $S$, the images of $B_1(x, 1/(200\varepsilon))$ are disjoint subsets of the image of $B_1(0, 41/(200\varepsilon))$. By comparing their volumes, we have $41^m \ge |S|$, implying $m = \Omega(\varepsilon^{-1}\log(\varepsilon n))$.

Next, consider the set $S'$ of all vectors in $\mathbb{R}^n$ with exactly $k$ coordinates equal to $1/k$ and the rest equal to 0. For any $x \in S'$ and $z \in B_1(x, 1/(3k))$, we have $\|z_{\mathrm{tail}(k)}\|_1 \le 1/(3k)$ and $z \in B_1(0, 1 + 1/(3k))$. Therefore, if $z'$ is a good approximation of $z$, the indices of the largest $k$ coordinates of $z'$ are exactly the same as those of $x$. Thus, for any two different vectors $x, y \in S'$ and $z \in B_1(x, 1/(3k))$, $t \in B_1(y, 1/(3k))$, the outputs for inputs $z$ and $t$ are different and hence, we must have $Az \ne At$. Notice that for the mapping $x \mapsto Ax$, the image of $B_1(x, 1/(3k))$ is the translated version of the image of $B_1(0, 1 + 1/(3k))$ scaled down in every dimension by a factor of $3k + 1$. For $x$'s in $S'$, the images of $B_1(x, 1/(3k))$ are disjoint subsets of the image of $B_1(0, 1 + 1/(3k))$. By comparing their volumes, we have $(3k + 1)^m \ge |S'| \ge (n/k)^k$, implying $m = \Omega(k\log(n/k)/\log k)$.



4 Lower Bounds for £\/£i recovery 

Recall in the $\ell_1/\ell_1$-recovery problem, we would like to design a matrix $A \in \mathbb{R}^{m \times n}$ such that for any $x \in \mathbb{R}^n$,
given $Ax$ we can recover $x' \in \mathbb{R}^n$ such that $\|x - x'\|_1 \le (1 + \varepsilon)\|x_{\mathrm{tail}(k)}\|_1$. We now show two lower bounds.

Theorem 10. Let $0 < \varepsilon < 1/\sqrt{8}$ be arbitrary, and $k$ be an integer. Suppose $k/\varepsilon^2 < (n - 1)/2$. Then
any matrix $A \in \mathbb{R}^{m \times n}$ which allows $\ell_1/\ell_1$-recovery with the $k$-tail guarantee with error $\varepsilon$ must have
$m \ge \min\{n/2, (1/16)k/\varepsilon^2\}$.

Proof: Without loss of generality we may assume that the rows of A are orthonormal. This is because 
first we can discard rows of A until the rows remaining form a basis for the rowspace of A. Call this new 
matrix with potentially fewer rows A'. Note that any dot products of rows of A with x that the recovery 
algorithm uses can be obtained by taking linear combinations of entries of $A'x$. Next, we can find a
matrix $T \in \mathbb{R}^{m \times m}$ so that $TA'$ has orthonormal rows, and given $TA'x$ we can recover $A'x$ in post-processing
by left-multiplication with $T^{-1}$.
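
The reduction to orthonormal rows can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's procedure: it uses Gram-Schmidt to drop dependent rows and orthonormalize the rest, producing $Q$ (playing the role of $TA'$) and a coefficient matrix $R$ with $A = RQ$, so that $Ax$ is recovered from $Qx$ by left-multiplication with $R$. The example matrix and all function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize_rows(rows, tol=1e-9):
    """Gram-Schmidt on the rows; rows that are (numerically) dependent are discarded."""
    basis = []
    for r in rows:
        w = list(r)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

def matvec(M, x):
    return [dot(row, x) for row in M]

# Hypothetical sketch matrix whose third row equals the sum of the first two.
A = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, 0.0],
     [1.0, 3.0, 1.0, 1.0]]
Q = orthonormalize_rows(A)                    # orthonormal rows, same row space
R = [[dot(a, q) for q in Q] for a in A]       # coefficients with A = R Q

x = [0.5, -1.0, 2.0, 3.0]
recovered = matvec(R, matvec(Q, x))           # post-processing of the sketch Qx
direct = matvec(A, x)
```

The sketch $Qx$ has only two entries here, yet it determines all three entries of $Ax$, which is the point of the reduction.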

We henceforth assume that the rows of $A$ are orthonormal. Since $A \cdot 0 = 0$, and our recovery procedure
must in particular be accurate for $x = 0$, the recovery procedure must output $x' = 0$ for any $x \in \ker(A)$. We
consider $x = (I - A^TA)y$ for $y = \sum_{i=1}^k \sigma_i e_{\pi(i)}$. Here $\pi$ is a random permutation on $n$ elements, and $\sigma_1, \ldots, \sigma_k$
are independent and uniform random variables in $\{-1, 1\}$. Since $x \in \ker(A)$, which follows since $AA^T = I$
by orthonormality of the rows of $A$, the recovery algorithm will output $x' = 0$. Nevertheless, we will show
that unless $m \ge \min\{n/2, (1/16)k/\varepsilon^2\}$, we will have $\|x\|_1 > (1 + \varepsilon)\|x_{\mathrm{tail}(k)}\|_1$ with positive probability, so
that by the probabilistic method there exists $x \in \ker(A)$ for which $x' = 0$ is not a valid output.
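
The construction of $x$ can be checked numerically. The following Python sketch is illustrative only: it assumes a particular matrix with orthonormal rows (the first $m$ rows of a normalized Sylvester Hadamard matrix, chosen for convenience and not taken from the paper), samples $y$ as described, and confirms that $x = (I - A^TA)y$ lies in $\ker(A)$. It also splits $\|x\|_1$ into its head and tail parts.

```python
import random

def hadamard(n):
    """Sylvester Hadamard matrix of order n (n must be a power of two)."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def kernel_vector(A, k, rng):
    """Sample y = sum_i sigma_i e_{pi(i)} and return x = (I - A^T A) y."""
    m, n = len(A), len(A[0])
    y = [0.0] * n
    for j in rng.sample(range(n), k):        # pi(1), ..., pi(k)
        y[j] = rng.choice((-1.0, 1.0))       # sigma_i
    Ay = matvec(A, y)
    AtAy = [sum(A[r][c] * Ay[r] for r in range(m)) for c in range(n)]
    return [yc - wc for yc, wc in zip(y, AtAy)]

n, m, k = 16, 4, 3                           # hypothetical sizes
A = [[v / n ** 0.5 for v in row] for row in hadamard(n)[:m]]   # orthonormal rows
x = kernel_vector(A, k, random.Random(0))

residual = max(abs(t) for t in matvec(A, x))     # numerically zero: x is in ker(A)
mags = sorted((abs(t) for t in x), reverse=True)
head, tail = sum(mags[:k]), sum(mags[k:])        # l1 mass of head(k) and tail(k)
frob_sq = sum(v * v for row in A for v in row)   # equals m, since rows have unit norm
```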

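If $m \ge n/2$ we are done. Otherwise, since $\|x\|_1 = \|x_{\mathrm{head}(k)}\|_1 + \|x_{\mathrm{tail}(k)}\|_1$, it is equivalent to show that
$\|x_{\mathrm{head}(k)}\|_1 > \varepsilon\|x_{\mathrm{tail}(k)}\|_1$ with positive probability. We first have

$$\begin{aligned}
\mathbb{E}\|x_{\mathrm{tail}(k)}\|_1 &\le \mathbb{E}\|x\|_1 \\
&\le \mathbb{E}\|y\|_1 + \mathbb{E}\|A^TAy\|_1 \\
&\le (\mathbb{E}\|y\|_2^2)^{1/2} + \sqrt{n}\cdot(\mathbb{E}\|A^TAy\|_2^2)^{1/2} && (2) \\
&= \sqrt{k} + \sqrt{n}\cdot(\mathbb{E}\, y^TA^TAA^TAy)^{1/2} \\
&= \sqrt{k} + \sqrt{n}\cdot(\mathbb{E}\, y^TA^TAy)^{1/2} && (3) \\
&= \sqrt{k} + \sqrt{n}\cdot\Big(\mathbb{E}\Big(\sum_{j=1}^k \sigma_j A_{\pi(j)}\Big)^T\Big(\sum_{j=1}^k \sigma_j A_{\pi(j)}\Big)\Big)^{1/2} \\
&= \sqrt{k} + \sqrt{n}\cdot\Big(\sum_{j=1}^k \mathbb{E}\|A_{\pi(j)}\|_2^2\Big)^{1/2} \\
&= \sqrt{k} + \sqrt{kn}\cdot(\mathbb{E}\|A_{\pi(1)}\|_2^2)^{1/2} \\
&= \sqrt{k} + \sqrt{km}. && (4)
\end{aligned}$$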

Eq. (2) uses Cauchy-Schwarz. Eq. (3) follows since $A$ has orthonormal rows, so that $AA^T = I$. Eq. (4) uses
that the sum of squared entries over all columns equals the sum of squared entries over rows, which is $m$
since the rows have unit norm.

We now turn to lower bounding $\|x_{\mathrm{head}(k)}\|_1$. Define $\eta_{i,j} = \sigma_j/\sigma_i$ so that for fixed $i$ the $\eta_{i,j}$ are independent
and uniform $\pm 1$ random variables (except for $\eta_{i,i}$, which is 1). We have

$$\|x_{\mathrm{head}(k)}\|_1 \ge \|x_{\pi([k])}\|_1 = \sum_{i=1}^k \Big| 1 - \sum_{j=1}^k \eta_{i,j}\langle A_{\pi(i)}, A_{\pi(j)}\rangle \Big| .$$

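Now, for fixed $i \in [k]$ we have

$$\begin{aligned}
\mathbb{E}\Big|\sum_{j=1}^k \eta_{i,j}\langle A_{\pi(i)}, A_{\pi(j)}\rangle\Big|
&\le \Big(\mathbb{E}\Big(\sum_{j=1}^k \eta_{i,j}\langle A_{\pi(i)}, A_{\pi(j)}\rangle\Big)^2\Big)^{1/2} && (5) \\
&= \sqrt{k}\cdot\big(\mathbb{E}\langle A_{\pi(1)}, A_{\pi(2)}\rangle^2\big)^{1/2} \\
&\le \sqrt{\frac{k}{n(n-1)}}\cdot\|A^TA\|_F \\
&= \sqrt{\frac{k}{n(n-1)}}\cdot\|A\|_F && (6) \\
&= \sqrt{\frac{mk}{n(n-1)}} \\
&< \cdots && (7)
\end{aligned}$$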



Eq. (6) follows since $\|A^TA\|_F^2 = \mathrm{trace}(A^TAA^TA) = \mathrm{trace}(A^TA) = \|A\|_F^2$. Here $\|\cdot\|_F$ denotes the Frobenius
norm, i.e. $\|B\|_F = \big(\sum_{i,j} B_{i,j}^2\big)^{1/2}$.



Putting things together, by Eq. (4), a random vector $x$ has $\|x_{\mathrm{tail}(k)}\|_1 < 4\sqrt{km}$ with
probability strictly larger than 1/2 by Markov's inequality. Also, call an $i \in [k]$ bad if $|x_{\pi(i)}| < 1/2$.
Combining Eq. (5) with Eq. (7) and using a Markov bound we have that the expected number of bad indices
$i \in [k]$ is less than $k/4$. Thus the probability that a random $x$ has more than $k/2$ bad indices is less than 1/2
by Markov's inequality. Thus by a union bound, with probability strictly larger than $1 - (1/2) - (1/2) = 0$,
a random $x$ taken as described simultaneously has $\|x_{\mathrm{tail}(k)}\|_1 < 4\sqrt{km}$ and less than $k/2$ bad indices,
the latter of which implies that $\|x_{\mathrm{head}(k)}\|_1 > k/2$. Thus there exists a vector $x \in \ker(A)$ for which
$\|x_{\mathrm{head}(k)}\|_1 > \varepsilon\|x_{\mathrm{tail}(k)}\|_1$ when $m < (1/16)k/\varepsilon^2$, and we thus must have $m \ge (1/16)k/\varepsilon^2$. □



We now give another lower bound via a different approach. As in [9, 41], we use 2-party communication
complexity to prove an $\Omega((k/\varepsilon)\log(\varepsilon n/k))$ bound on the number of rows of any $\ell_1/\ell_1$ sparse recovery scheme.
The main difference from prior work is that we use deterministic communication complexity and a different
communication problem.

We give a brief overview of the concepts from communication complexity that we need, referring the
reader to [37] for further details. Formally, in the 1-way deterministic 2-party communication complexity
model, there are two parties, Alice and Bob, holding inputs $x, y \in \{0,1\}^r$, respectively. The goal is to
compute a Boolean function $f(x, y)$. A single message $m(x)$ is sent from Alice to Bob, who then outputs
$g(m(x), y)$ for a Boolean function $g$. The protocol is correct if $g(m(x), y) = f(x, y)$ for all inputs $x$ and
$y$. The 1-way deterministic communication complexity of $f$, denoted $D^{1\text{-way}}(f)$, is the minimum over all
correct protocols, of the maximum message length $|m(x)|$ over all inputs $x$.

We use the $EQ(x, y) : \{0,1\}^r \times \{0,1\}^r \to \{0,1\}$ function, which is 1 if $x = y$ and 0 otherwise. It is known
[37] that $D^{1\text{-way}}(EQ) = r$. We show how to use a pair $(A, Out)$ with the property that for all vectors $z$,
the output $z'$ of $Out(Az)$ satisfies $\|z - z'\|_1 \le (1 + \varepsilon)\|z_{\mathrm{tail}(k)}\|_1$, to construct a correct protocol for $EQ$ on
strings $x, y \in \{0,1\}^r$ for $r = \Theta((k/\varepsilon)\log n\log(\varepsilon n/k))$. We then show how this implies the number of rows of
$A$ is $\Omega((k/\varepsilon)\log(\varepsilon n/k))$.
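
The bound $D^{1\text{-way}}(EQ) = r$ rests on a pigeonhole argument: if two distinct inputs of Alice produce the same message, Bob cannot answer correctly when his input equals one of them. The Python sketch below illustrates this by brute force for a small $r$. The message functions `truncate` and `identity` are hypothetical examples introduced here, not protocols from the paper.

```python
from itertools import product

def fooling_pair(message, r):
    """Return two distinct r-bit inputs with the same message, or None if none exist."""
    seen = {}
    for x in product((0, 1), repeat=r):
        msg = message(x)
        if msg in seen:
            return seen[msg], x      # Bob holding y = seen[msg] cannot tell these apart
        seen[msg] = x
    return None

r = 4

def truncate(x):
    """Hypothetical (r-1)-bit message: drop the last bit of x."""
    return x[:r - 1]

def identity(x):
    """Trivial r-bit message: send x itself."""
    return x

collision = fooling_pair(truncate, r)    # exists, so this protocol must err on EQ
no_collision = fooling_pair(identity, r)
```

Any message map into fewer than $2^r$ values has such a collision, which is why $r$ bits are necessary.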

We can assume the rows of $A$ are orthonormal as in the beginning of the proof of Theorem 10. Let $A'$
be the matrix where we round each entry of $A$ to $b = O(\log n)$ bits per entry. We use the following Lemma
of [9].

Lemma 11. (Lemma 5.1 of [9]) Consider any $m \times n$ matrix $A$ with orthonormal rows. Let $A'$ be the result
of rounding $A$ to $b$ bits per entry. Then for any $v \in \mathbb{R}^n$ there exists an $s \in \mathbb{R}^n$ with $A'v = A(v - s)$ and
$\|s\|_1 < n^2 2^{-b}\|v\|_1$.

Theorem 12. Any matrix $A$ which allows $\ell_1/\ell_1$-recovery with the $k$-tail guarantee with error $\varepsilon$ satisfies
$m = \Omega((k/\varepsilon)\log(\varepsilon n/k))$.

Proof: Let $S$ be the set of all strings in $\{0, c\varepsilon/k\}^n$ containing exactly $k/(c\varepsilon)$ entries equal to $c\varepsilon/k$, for an
absolute constant $c > 0$ specified below. Observe that $\log|S| = \Omega((k/\varepsilon)\log(\varepsilon n/k))$.

In the $EQ(x, y)$ problem, Alice is given a string $x$ of length $r = \log n \cdot \log|S|$. Alice splits $x$ into $\log n$
contiguous chunks $x^1, \ldots, x^{\log n}$, each containing $r/\log n$ bits. She uses $x^i$ as an index to choose an element
of $S$. She sets

$$u = \sum_{i=1}^{\log n} 2^i x^i,$$

and transmits $A'u$ to Bob.

Bob is given a string $y$ of length $r$ in the $EQ(x, y)$ problem. He performs the same procedure as Alice,
namely, he splits $y$ into $\log n$ contiguous chunks $y^1, \ldots, y^{\log n}$, each containing $r/\log n$ bits. He uses $y^i$ as an
index to choose an element of $S$. He sets

$$v = \sum_{i=1}^{\log n} 2^i y^i.$$

Given $A'u$, he computes $A'(u - v)$, which by applying Lemma 11 once to $u$ and once to $v$, is equal to
$A(u - v - s)$ for an $s$ with $\|s\|_1 \le n^2 2^{-b}(\|u\|_1 + \|v\|_1) \le 1/n$, where the last inequality follows for sufficiently
large $b = O(\log n)$. If $A'(u - v) = 0$, he outputs that $x$ and $y$ are equal, otherwise he outputs that $x$ and $y$
are not equal.
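
The encoding step of the protocol can be sketched in Python as follows. This is a toy illustration under stated assumptions: chunks are given directly as integer indices rather than bit strings, the set $S$ is enumerated explicitly (feasible only for tiny $n$), and $t$ stands for the support size $k/(c\varepsilon)$, so each entry equals $1/t = c\varepsilon/k$. All names and parameter values are hypothetical. Exact rational arithmetic is used so that the norms can be compared without rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def element_of_S(index, n, t):
    """index-th vector (lexicographic supports) with exactly t entries equal to 1/t."""
    support = list(combinations(range(n), t))[index]   # explicit enumeration, tiny n only
    return [Fraction(1, t) if j in support else Fraction(0) for j in range(n)]

def encode(chunks, n, t):
    """Return sum_i 2^i * x^i, where chunk i (1-based) indexes an element of S."""
    u = [Fraction(0)] * n
    for i, idx in enumerate(chunks, start=1):
        xi = element_of_S(idx, n, t)
        u = [a + 2 ** i * b for a, b in zip(u, xi)]
    return u

n, t = 6, 2                        # |S| = C(6, 2) = 15
u = encode([3, 7, 1], n, t)        # Alice's chunks
v = encode([3, 7, 2], n, t)        # Bob's chunks; they differ only in chunk i = 3
diff = [a - b for a, b in zip(u, v)]
linf = max(abs(d) for d in diff)   # largest coordinate of u - v
l1 = sum(abs(d) for d in diff)     # l1 norm of u - v
```

In this example the largest differing chunk has index $i = 3$, and the largest coordinate of $u - v$ has magnitude $2^i/t$, while $\|u - v\|_1$ stays below $2^{i+2}$.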

Observe that if $x = y$, then $u = v$, and so Bob outputs the correct answer. Next, we consider $x \ne y$, and
show that $A'(u - v) \ne 0$. To do this, it suffices to show that $\|(u - v - s)_{\mathrm{head}(k)}\|_1 > \varepsilon\|u - v - s\|_1$, as then
$Out(A(u - v - s))$ could not output 0, which would also mean that $A'(u - v) \ne 0$.

To show that $\|(u - v - s)_{\mathrm{head}(k)}\|_1 > \varepsilon\|u - v - s\|_1$, first observe that $\|s\|_1 \le 1/n$, so by the triangle
inequality, it is enough to show that $\|(u - v)_{\mathrm{head}(k)}\|_1 > 2\varepsilon\|u - v\|_1$.

Let $z^1 = u - v$. Let $i \in [\log n]$ be the largest index of a chunk for which $x^i \ne y^i$, and let $j_1$ be such that
$|z^1_{j_1}| = \|z^1\|_\infty$. Then $|z^1_{j_1}| = c\varepsilon \cdot 2^i/k$, while

$$\|z^1\|_1 \le 2\cdot 2 + 2\cdot 4 + 2\cdot 8 + \cdots + 2\cdot 2^i < 2\cdot 2^{i+1} = 2^{i+2}.$$

Let $z^2$ be $z^1$ with coordinate $j_1$ removed. Repeating this argument on $z^2$, we again find a coordinate $j_2$ with
$|z^2_{j_2}| \ge \frac{c\varepsilon}{4k}\cdot\|z^2\|_1$. It follows by induction that after $k$ steps, and for $\varepsilon > 0$ less than an absolute constant
$\varepsilon_0 > 0$,




and so

$$\|(u - v)_{\mathrm{head}(k)}\|_1 > c\varepsilon\|u - v\|_1.$$

Setting $c = 2$, we have that $\|(u - v)_{\mathrm{head}(k)}\|_1 > 2\varepsilon\|u - v\|_1$, as desired.

Finally, observe the communication of this protocol is the number of rows of $A$ times $O(\log n)$, since this
is the number of bits required to specify $m(x) = A'u$. It follows by the communication lower bound for $EQ$,
that the number of rows of $A$ is $\Omega(r/\log n) = \Omega((k/\varepsilon)\log(\varepsilon n/k))$. This proves our theorem. □

5 Deterministic Norm Estimation and the Gelfand Width 

Theorem 13. For $1 \le p < q \le \infty$, let $m$ be the minimum number such that there is an $(n - m)$-dimensional
subspace $S$ of $\mathbb{R}^n$ satisfying $\sup_{v \in S} \|v\|_q/\|v\|_p \le \varepsilon$. Then there is an $m \times n$ matrix $A$ and associated output
procedure $Out$ which for any $x \in \mathbb{R}^n$, given $Ax$, outputs an estimate of $\|x\|_q$ with additive error at most
$\varepsilon\|x\|_p$. Moreover, any matrix $A$ with fewer rows will fail to perform the same task.

Proof: Consider a matrix $A$ whose kernel is such a subspace. For any sketch $z$, we need to return a number
in the range $[\|x\|_q - \varepsilon\|x\|_p, \|x\|_q + \varepsilon\|x\|_p]$ for any $x$ satisfying $Ax = z$. Assume for contradiction that it is
not possible. Then there exist $x$ and $y$ such that $Ax = Ay$ but $\|x\|_q - \varepsilon\|x\|_p > \|y\|_q + \varepsilon\|y\|_p$. However, since
$x - y$ is in the kernel of $A$,

$$\|x\|_q - \|y\|_q \le \|x - y\|_q \le \varepsilon\|x - y\|_p \le \varepsilon(\|x\|_p + \|y\|_p).$$

Thus, we have a contradiction. The above argument also shows that given the sketch $z$, the output
procedure can return $\min_{x : Ax = z} \|x\|_q + \varepsilon\|x\|_p$. This is a convex optimization problem that can be solved
using the ellipsoid algorithm. Below we give the details of the algorithm for finding a $(1 + \varepsilon)$-approximation
of OPT.

Let $y = A^T(AA^T)^{-1}z$. Then $Ay = z = Ax$, $y$ is the projection of $x$ on the space spanned by the rows of
$A$, and thus $y$ is the vector of minimum $\ell_2$ norm satisfying $Ay = z$. We have for any $x$ satisfying $Ax = z$,

$$n^{-1/2}\|y\|_2 \le n^{-1/2}\|x\|_2 \le \|x\|_q \le OPT = \min_{x : Ax = z} \|x\|_q + \varepsilon\|x\|_p \le \|y\|_q + \varepsilon\|y\|_p \le (1 + \varepsilon)\sqrt{n}\,\|y\|_2 \qquad (8)$$
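
As a small numerical illustration of the minimum-norm solution, the Python sketch below assumes a matrix $A$ with orthonormal rows, so that $AA^T = I$ and $y = A^T(AA^T)^{-1}z$ reduces to $y = A^Tz$. This simplification, the choice of a normalized Hadamard submatrix, and the test vector are assumptions made for the example only.

```python
def hadamard(n):
    """Sylvester Hadamard matrix of order n (n must be a power of two)."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def transpose_matvec(M, z):
    return [sum(M[r][c] * z[r] for r in range(len(M))) for c in range(len(M[0]))]

def norm2(v):
    return sum(t * t for t in v) ** 0.5

n, m = 8, 3
A = [[v / n ** 0.5 for v in row] for row in hadamard(n)[:m]]   # orthonormal rows
x = [3.0, -1.0, 0.5, 2.0, 0.0, 4.0, -2.5, 1.0]                 # hypothetical input
z = matvec(A, x)                     # the sketch
y = transpose_matvec(A, z)           # equals A^T (A A^T)^{-1} z because A A^T = I
```

Here $y$ is computed from the sketch alone, reproduces the same sketch, and has $\ell_2$ norm no larger than that of $x$.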

The value $\|y\|_2$ can be computed from the sketch $z$, and we use this value to find OPT using binary search.
Specifically, in each step we use the ellipsoid algorithm to solve the feasibility problem $\|x\|_q + \varepsilon\|x\|_p \le M$
on the affine subspace $Ax = z$. Recall that when solving feasibility problems, the ellipsoid algorithm takes
time polynomial in the dimension, the running time of a separation oracle, and the logarithm of the ratio of
volumes of an initial ellipsoid containing a feasible point and the volume of the intersection of that ellipsoid
with the feasible set. Let $x^*$ be the optimal solution of the minimization problem. If $M \ge (1 + \varepsilon)OPT$ then

by the triangle inequality every point in the $\ell_2$ ball centered at $x^*$ of radius $\varepsilon n^{-1}\|y\|_2$ is feasible. Furthermore,
by Eq. (8) the set of feasible solutions is contained in the intersection of the $\ell_2$ ball about the origin of radius
$(1 + \varepsilon)n\|y\|_2$ and the affine subspace (or equivalently, the $\ell_2$ ball about $y$ of radius $\sqrt{(1 + \varepsilon)^2n^2 - 1}\,\|y\|_2$ and
the affine subspace). Thus, the ellipsoid algorithm runs in time polynomial in $n$ and $\log(1/\varepsilon)$ assuming a
polynomial time separation oracle.

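Now we describe the separation oracle. Consider a point $x$ such that $\|x\|_q + \varepsilon\|x\|_p > M$. We want to
find a hyperplane separating $x$ and $\{y : \|y\|_q + \varepsilon\|y\|_p \le M\}$. Without loss of generality assume that $x_i \ge 0$
for all $i$. Define $f_{x,p,i}$ as follows:

$$f_{x,p,i} = \begin{cases}
x_i^{p-1}/\|x\|_p^{p-1} & \text{if } p < \infty \\
1/k & \text{if } p = \infty \text{ and } x_i = \max_j x_j \text{ and } k = |\{t : x_t = \max_j x_j\}| \\
0 & \text{if } p = \infty \text{ and } x_i < \max_j x_j
\end{cases}$$

The hyperplane we consider is $h \cdot y = h \cdot x$ where $h_i = f_{x,q,i} + \varepsilon f_{x,p,i}$.

Lemma 14. If $h \cdot y \ge h \cdot x$ then $\|y\|_q + \varepsilon\|y\|_p \ge \|x\|_q + \varepsilon\|x\|_p$.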

Proof: For any $y$, consider $y'$ such that $y'_i = |y_i|$. We have $\|y'\|_q + \varepsilon\|y'\|_p = \|y\|_q + \varepsilon\|y\|_p$ and $h \cdot y' \ge h \cdot y$.
Thus, we only need to prove the claim for $y$ such that $y_i \ge 0$ for all $i$.

If $p < \infty$ then by Hölder's inequality,

$$\|y\|_p \cdot \|x\|_p^{p-1} = \|y\|_p \cdot \|(x_i^{p-1})_i\|_{p/(p-1)} \ge \sum_i y_i x_i^{p-1}.$$

If $p = \infty$ then $\|y\|_\infty \ge \sum_{i : x_i = \max_j x_j} y_i/k$.

In either case, $\|y\|_p \ge \sum_i y_i f_{x,p,i}$, and the same inequality holds for $p$ replaced with $q$. Thus,

$$\|y\|_q + \varepsilon\|y\|_p \ge y \cdot h \ge x \cdot h = \|x\|_q + \varepsilon\|x\|_p.$$



By the above lemma, h separates x and the set of feasible solutions. This concludes the description of 
the algorithm. 
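
The separating hyperplane can be sketched in Python as follows. This is a minimal illustration, assuming a nonnegative and nonzero input $x$ and using the form of $f_{x,p,i}$ implied by the Hölder step in the proof of Lemma 14; the function names and the sample point are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v, p):
    if p == math.inf:
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def f(x, p):
    """Vector (f_{x,p,i})_i for a nonnegative, nonzero x; satisfies f . x = ||x||_p."""
    if p == math.inf:
        top = max(x)
        k = sum(1 for t in x if t == top)          # number of maximal coordinates
        return [1.0 / k if t == top else 0.0 for t in x]
    scale = norm(x, p) ** (p - 1)
    return [t ** (p - 1) / scale for t in x]

def separating_normal(x, p, q, eps):
    """Normal vector h with h_i = f_{x,q,i} + eps * f_{x,p,i}."""
    return [a + eps * b for a, b in zip(f(x, q), f(x, p))]

x = [3.0, 0.0, 1.0, 3.0]                           # hypothetical infeasible point
h = separating_normal(x, 1, 2, 0.1)
value_at_x = dot(h, x)                             # equals ||x||_2 + 0.1 * ||x||_1
```

By construction $h \cdot x = \|x\|_q + \varepsilon\|x\|_p$, while $h \cdot y \le \|y\|_q + \varepsilon\|y\|_p$ for every $y$, so every feasible $y$ lies strictly on one side of the hyperplane through $x$.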

For the lower bound, consider a matrix $A$ with fewer than $m$ rows. Then in the kernel of $A$, there exists
$v$ such that $\|v\|_q > \varepsilon\|v\|_p$. Both $v$ and the zero vector give the same sketch (a zero vector). However, by
the stated requirement, we need to output 0 for the zero vector but some positive number for $v$. Thus, no
matrix $A$ with fewer than $m$ rows can solve the problem.

The subspace $S$ of highest dimension of $\mathbb{R}^n$ satisfying $\sup_{v \in S} \|v\|_q/\|v\|_p \le \varepsilon$ is related to the Gelfand width, a
well-studied notion in functional analysis.

Definition 15. Fix $p < q$. The Gelfand width of order $m$ of $\ell_p$ and $\ell_q$ unit balls in $\mathbb{R}^n$ is defined as

$$\inf_{\text{subspace } A :\, \mathrm{codim}(A) = m} \; \sup_{v \in A} \frac{\|v\|_q}{\|v\|_p}.$$

Using known bounds for the Gelfand width for p = 1 and q = 2, we get the following corollary. 

Corollary 16. Assume that $1/\varepsilon^2 < n/2$. There is an $m \times n$ matrix $A$ and associated output procedure $Out$
which for any $x \in \mathbb{R}^n$, given $Ax$, outputs an estimate $e$ such that $\|x\|_2 - \varepsilon\|x\|_1 \le e \le \|x\|_2 + \varepsilon\|x\|_1$. Here
$m = O(\varepsilon^{-2}\log(\varepsilon^2 n))$ and this bound for $m$ is tight.

Proof: The corollary follows from the following bound on the Gelfand width by Foucart et al. [20] and
Garnaev and Gluskin [24]:

$$\inf_{\text{subspace } A :\, \mathrm{codim}(A) = m} \; \sup_{v \in A} \frac{\|v\|_2}{\|v\|_1} = \Theta\left(\sqrt{\frac{1 + \log(n/m)}{m}}\right).$$
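
To see how the width bound translates into a number of rows, the Python sketch below finds the smallest $m$ with $\sqrt{(1 + \log(n/m))/m} \le \varepsilon$. It takes the hidden constant in $\Theta(\cdot)$ to be 1, which is an assumption made purely for illustration, so the returned value indicates the growth rate $\varepsilon^{-2}\log(\varepsilon^2 n)$ rather than an exact row count. The function names are hypothetical.

```python
import math

def width_bound(n: int, m: int) -> float:
    """sqrt((1 + log(n/m)) / m), with the Theta(.) constant assumed to be 1."""
    return math.sqrt((1 + math.log(n / m)) / m)

def rows_needed(n: int, eps: float) -> int:
    """Smallest m in [1, n] with width_bound(n, m) <= eps (n if there is none)."""
    for m in range(1, n + 1):        # the bound is decreasing in m on this range
        if width_bound(n, m) <= eps:
            return m
    return n

n, eps = 10 ** 4, 0.2                # hypothetical parameters with 1/eps^2 < n/2
m = rows_needed(n, eps)
reference = eps ** -2 * math.log(eps ** 2 * n)   # eps^{-2} log(eps^2 n), for comparison
```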



Acknowledgments 

We thank Raghu Meka for answering several questions about almost k-wise independent sample spaces. We
thank an anonymous reviewer for pointing out the connection between incoherent matrices and e-biased 
spaces, which are used to construct almost k-wise independent sample spaces. 



References 

[1] Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins.
J. Comput. Syst. Sci., 66(4):671-687, 2003.

[2] Nir Ailon and Bernard Chazelle. The fast Johnson-Lindenstrauss transform and approximate nearest
neighbors. SIAM J. Comput., 39(1):302-322, 2009.






[3] Nir Ailon and Edo Liberty. Fast dimension reduction using Rademacher series on dual BCH codes. 
Discrete & Computational Geometry, 42(4):615-630, 2009. 

[4] Nir Ailon and Edo Liberty. Almost optimal unrestricted fast Johnson-Lindenstrauss transform. In 
Proceedings of the 22nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 185- 
191, 2011. 

[5] Noga Alon. Problems and results in extremal combinatorics - I. Discrete Mathematics, 273(1-3):31-53,
2003.

[6] Noga Alon. Perturbed identity matrices have high rank: Proof and applications. Combinatorics,
Probability & Computing, 18(1-2):3-15, 2009.

[7] Noga Alon, Oded Goldreich, Johan Håstad, and René Peralta. Simple construction of almost k-wise
independent random variables. Random Struct. Algorithms, 3(3):289-304, 1992.

[8] Noga Alon, Yossi Matias, and Mario Szegedy. The Space Complexity of Approximating the Frequency
Moments. JCSS, 58(1):137-147, 1999.

[9] Khanh Do Ba, Piotr Indyk, Eric Price, and David P. Woodruff. Lower bounds for sparse recovery. In 
SODA, pages 1190-1197, 2010. 

[10] Richard Baraniuk, Mark A. Davenport, Ronald DeVore, and Michael Wakin. A simple proof of the
Restricted Isometry Property. Constructive Approximation, 28(3):253-263, 2008.

[11] Daniel Barbara, Ningning Wu, and Sushil Jajodia. Detecting novel network intrusions using Bayes 
estimators. In Proceedings of the 1st SIAM International Conference on Data Mining, 2001. 

[12] Avraham Ben-Aroya and Amnon Ta-Shma. Constructing small-bias sets from algebraic-geometric codes. 
In FOCS, pages 191-197, 2009. 

[13] Emmanuel Candes, Justin Romberg, and Terence Tao. Robust uncertainty principles: Exact sig- 
nal reconstruction from highly incomplete frequency information. IEEE Trans. Information Theory, 
52(2):489-509, 2006. 

[14] Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. Theor.
Comput. Sci., 312(1):3-15, 2004.

[15] Albert Cohen, Wolfgang Dahmen, and Ronald A. DeVore. Compressed sensing and best k-term
approximation. J. Amer. Math. Soc., 22:211-231, 2009.

[16] Graham Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch
and its applications. J. Algorithms, 55(1):58-75, 2005.

[17] Graham Cormode and S. Muthukrishnan. What's hot and what's not: tracking most frequent items
dynamically. ACM Trans. Database Syst., 30(1):249-278, 2005.

[18] Erik D. Demaine, Alejandro Lopez-Ortiz, and J. Ian Munro. Frequency estimation of Internet packet 
streams with limited space. In ESA, pages 348-360, 2002. 

[19] David L. Donoho and Xiaoming Huo. Uncertainty principles and ideal atomic decomposition. IEEE 
Trans. Inform. Th., 47:2558-2567, 2001. 

[20] Simon Foucart, Alain Pajor, Holger Rauhut, and Tino Ullrich. The Gelfand widths of $\ell_p$-balls for
$0 < p \le 1$. Journal of Complexity, 26(6):629-640, 2010.

[21] Sumit Ganguly. Lower bounds on frequency estimation of data streams. In CSR, pages 204-215, 2008. 
Full version at http://www.cse.iitk.ac.in/users/sganguly/csr-full.pdf. 






[22] Sumit Ganguly. Deterministically estimating data stream frequencies. In COCOA, pages 301-312, 2009. 

[23] Sumit Ganguly and Anirban Majumder. CR-precis: A deterministic summary structure for update data
streams. In ESCAPE, pages 48-59, 2007.

[24] Andrej Y. Garnaev and Efim D. Gluskin. On the widths of the Euclidean ball. Soviet Mathematics 
Doklady, 30:200-203, 1984. 

[25] Anna C. Gilbert, Yannis Kotidis, S. Muthukrishnan, and Martin J. Strauss. Quicksand: Quick summary 
and analysis of network data. DIMACS Technical Report 2001-43, 2001. 

[26] Anna C. Gilbert, S. Muthukrishnan, and Martin Strauss. Approximation of functions over redundant 
dictionaries using coherence. In SODA, pages 243-252, 2003. 

[27] Anna C. Gilbert, Martin J. Strauss, Joel A. Tropp, and Roman Vershynin. One sketch for all: fast 
algorithms for compressed sensing. In STOC, pages 237-246, 2007. 

[28] Efim D. Gluskin. On some finite-dimensional problems in the theory of widths. Vestn. Leningr. Univ.
Math., 14:163-170, 1982.

[29] Piotr Indyk and Milan Ruzic. Near-optimal sparse recovery in the $L_1$ norm. In FOCS, pages 199-207,
2008.

[30] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. 
Contemporary Mathematics, 26:189-206, 1984. 

[31] Hossein Jowhari, Mert Saglam, and Gabor Tardos. Tight bounds for $L_p$ samplers, finding duplicates in
streams, and related problems. In PODS, pages 49-58, 2011.

[32] Daniel M. Kane and Jelani Nelson. Sparser Johnson-Lindenstrauss transforms. In SODA, pages 1195- 
1206, 2012. 

[33] Richard M. Karp, Scott Shenker, and Christos H. Papadimitriou. A simple algorithm for finding frequent
elements in streams and bags. ACM Trans. Database Syst., 28:51-55, 2003.

[34] William H. Kautz and Richard C. Singleton. Nonrandom binary superimposed codes. IEEE Trans. Inf. 
Theory, 10:363-377, 1964. 

[35] Felix Krahmer and Rachel Ward. New and improved Johnson-Lindenstrauss embeddings via the Re- 
stricted Isometry Property. SIAM J. Math. Anal., 43(3):1269-1281, 2011. 

[36] Hari Krishna, Bal Krishna, Kuo-Yu Lin, and Jenn-Dong Sun. Computational Number Theory and 
Digital Signal Processing: Fast Algorithms and Error Control Techniques. CRC, Boca Raton, FL, 1994. 

[37] Eyal Kushilevitz and Noam Nisan. Communication complexity. Cambridge University Press, 1997. 

[38] Stephane G. Mallat and Zhifeng Zhang. Matching pursuits with time-frequency dictionaries. IEEE
Trans. Signal Process., 41(12):3397-3415, 1993.

[39] Jayadev Misra and David Gries. Finding repeated elements. Sci. Comput. Program., 2(2):143-152, 
1982. 

[40] Joseph Naor and Moni Naor. Small-bias probability spaces: Efficient constructions and applications.
SIAM J. Comput., 22(4):838-856, 1993.

[41] Eric Price and David P. Woodruff. (1 + eps)-approximate sparse recovery. In FOCS, pages 295-304, 
2011. 






[42] Mark Rudelson and Roman Vershynin. On sparse reconstruction from Fourier and Gaussian measure- 
ments. Communications on Pure and Applied Mathematics, 61:1025-1045, 2008. 

[43] D. Sivakumar. Algorithmic derandomization via complexity theory. In STOC, pages 619-626, 2002. 

[44] Michael A. Soderstrand, W. Kenneth Jenkins, Graham A. Jullien, and Fred J. Taylor. Residue Number 
System Arithmetic: Modern Applications in Digital Signal Processing. IEEE Press, New York, 1986. 

[45] Joachim von zur Gathen and Jürgen Gerhard. Modern Computer Algebra. Cambridge University Press,
1999.

[46] Richard W. Watson and Charles W. Hastings. Self-checked computation using residue arithmetic. Proc. 
IEEE, 4(12):1920-1931, 1966. 






